AMENDMENTS TO THE CLAIMS

The listing of claims below replaces all prior versions and listings of claims. 

1. (Canceled) 

2. (Canceled) 

3. (Canceled) 

4. (Canceled) 

5. (Currently amended) The method of claim 1 A method for reducing the restart
time for a parallel application, the parallel application including a plurality of parallel
operators, the method comprising:

repeating the following: 

setting a time interval to a next checkpoint;
waiting until the time interval expires;

sending checkpoint requests to each of the plurality of parallel operators; and
receiving and processing messages from one or more of the plurality of parallel 
operators; 

wherein receiving and processing messages from one or more of the plurality of 
parallel operators comprises:

receiving a checkpoint reject message from one of the plurality of parallel operators; 
sending abandon checkpointing messages to the plurality of parallel operators; and 
scheduling a new checkpoint. 
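
For illustration only, the coordinator behavior recited in claim 5 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the names `Operator` and `run_checkpoint_round`, the message strings, and the `checkpoint_ok` reply are hypothetical, and message passing is modeled as direct method calls rather than a control data stream.

```python
import time
from dataclasses import dataclass, field

# Hypothetical message names; the claims do not fix a wire format.
CHECKPOINT_REQUEST = "checkpoint_request"
CHECKPOINT_REJECT = "checkpoint_reject"
CHECKPOINT_OK = "checkpoint_ok"  # assumed reply when an operator can checkpoint
ABANDON_CHECKPOINT = "abandon_checkpoint"


@dataclass
class Operator:
    """Stand-in for one parallel operator (illustrative only)."""
    name: str
    ready_for_checkpoint: bool = True
    inbox: list = field(default_factory=list)

    def handle(self, message):
        # Record every message received and answer checkpoint requests.
        self.inbox.append(message)
        if message == CHECKPOINT_REQUEST:
            return CHECKPOINT_OK if self.ready_for_checkpoint else CHECKPOINT_REJECT
        return None


def run_checkpoint_round(operators, interval_seconds, wait=time.sleep):
    """One pass of the repeated steps: wait, request, process responses."""
    wait(interval_seconds)  # wait until the time interval expires
    responses = [op.handle(CHECKPOINT_REQUEST) for op in operators]
    if CHECKPOINT_REJECT in responses:
        # A reject from any operator abandons the checkpoint for all of them.
        for op in operators:
            op.handle(ABANDON_CHECKPOINT)
        return "rescheduled"  # the caller would set a new interval
    return "completed"
```

In this sketch a single reject causes every operator, including those that were ready, to receive the abandon message before a new checkpoint is scheduled.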

6. (Currently amended) The method of claim 1 A method for reducing the restart
time for a parallel application, the parallel application including a plurality of parallel
operators, the method comprising:

repeating the following:

setting a time interval to a next checkpoint;
waiting until the time interval expires;

sending checkpoint requests to each of the plurality of parallel operators; and
receiving and processing messages from one or more of the plurality of parallel
operators;

wherein receiving and processing messages from one or more of the plurality of 
parallel operators comprises: 

receiving a recoverable error message from one or more of the plurality of parallel 
operators; 

sending abandon checkpointing messages to the plurality of parallel operators; 
waiting for ready messages from all of the plurality of parallel operators; and 
scheduling a new checkpoint. 
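
As a hedged illustration of the recoverable-error path in claim 6, the following Python sketch processes messages from a queue. The function name `handle_recoverable_error`, the `(sender, kind)` tuple layout, the message strings, and the `send` callback are all assumptions made for this example; the claims do not specify them.

```python
import queue


def handle_recoverable_error(inbox, operator_names, send):
    """Process one message; on a recoverable error, abandon and wait for ready.

    inbox: queue.Queue of (sender, kind) tuples (hypothetical layout).
    send:  callable(name, kind) used to reach an operator (hypothetical).
    Returns True when a new checkpoint should be scheduled.
    """
    _sender, kind = inbox.get()
    if kind != "recoverable_error":
        return False

    # Tell every operator to abandon the checkpoint in progress.
    for name in operator_names:
        send(name, "abandon_checkpoint")

    # Wait for a ready message from all of the operators.
    pending = set(operator_names)
    while pending:
        sender, kind = inbox.get()
        if kind == "ready":
            pending.discard(sender)

    return True  # the caller schedules a new checkpoint
```

Unlike the reject path of claim 5, this path does not reschedule until every operator has reported ready, so `inbox.get()` blocks here until those messages arrive.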

7. (Canceled) 

8. (Canceled) 

9. (Canceled) 

10. (Canceled) 

11. (Canceled) 

12. (Canceled) 

13. (Canceled) 

14. (Canceled) 

15. (Canceled) 

16. (Currently amended) The method of claim 13 further comprising A method
for one of a plurality of parallel operators to record its state, the method comprising:

receiving a checkpoint request message on a control data stream;

waiting to enter a state suitable for checkpointing;

sending a response message on the control data stream; and

determining that the parallel operator is not in a state suitable for checkpointing; 

and

wherein sending a response message on the control data stream comprises sending 
a checkpoint reject message on the control data stream.

17. (Original) The method of claim 16 further comprising:

experiencing a recoverable error; and
wherein sending a response message on the control data stream comprises
sending a recoverable error message on the control data stream.

18. (Original) The method of claim 16 further comprising:

experiencing a non-recoverable error; and
wherein sending a response message on the control data stream comprises
sending a non-recoverable error message on the control data stream.
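
The operator-side responses recited in claims 16 through 18 can be illustrated with the short Python sketch below. It is a sketch under assumptions only: the `Response` enum, the `choose_response` function, the `CHECKPOINT_DONE` value, and the order in which the conditions are tested are hypothetical and are not taken from the claims.

```python
from enum import Enum


class Response(Enum):
    """Messages an operator may send on the control data stream."""
    CHECKPOINT_DONE = "checkpoint_done"  # hypothetical success reply
    CHECKPOINT_REJECT = "checkpoint_reject"
    RECOVERABLE_ERROR = "recoverable_error"
    NON_RECOVERABLE_ERROR = "non_recoverable_error"


def choose_response(suitable_for_checkpoint, error=None):
    """Pick the response to a checkpoint request.

    error is None, "recoverable", or "non_recoverable" (assumed encoding).
    Errors are tested before suitability; that ordering is an assumption.
    """
    if error == "non_recoverable":
        return Response.NON_RECOVERABLE_ERROR
    if error == "recoverable":
        return Response.RECOVERABLE_ERROR
    if not suitable_for_checkpoint:
        return Response.CHECKPOINT_REJECT
    return Response.CHECKPOINT_DONE
```

For example, `choose_response(False)` yields the checkpoint reject message of claim 16, while `choose_response(True, error="recoverable")` yields the recoverable error message of claim 17.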

19. (Canceled) 

20. (Currently amended) The computer program of claim 19 A computer
program, stored on a tangible storage medium, for use in reducing the restart time for a
parallel application, the parallel application comprising a plurality of parallel operators,
the computer program comprising:

a CRCF component which includes executable instructions that cause a computer to
repeat the following:

set a time interval to a next checkpoint;
wait until the time interval expires;

send checkpoint requests to the plurality of parallel operators;
receive and process messages from one or more of the plurality of parallel
operators; and

a plurality of parallel components, each of which is associated with one of the
plurality of parallel operators, and each of which includes executable instructions
that cause a computer to:

receive a checkpoint request message from the CRCF;
wait to enter a state suitable for checkpointing; and
send a checkpoint response message to the CRCF;
wherein 

each of the parallel components include executable instructions that cause a 
computer to: 
determine that the parallel operator is not in a state suitable for checkpointing;

and, in sending a response message to the CRCF, the parallel component 

associated with that parallel operator causes the computer to send a 

checkpoint reject message to the CRCF; and 
in receiving and processing messages from one or more of the plurality of parallel 

operators, the CRCF causes the computer to: 
receive the checkpoint reject message; and 

send abandon checkpoint messages to the plurality of parallel operators in 
response to the checkpoint reject message. 

21. (Currently amended) The computer program of claim 19 A computer
program, stored on a tangible storage medium, for use in reducing the restart time for a
parallel application, the parallel application comprising a plurality of parallel operators,
the computer program comprising:

a CRCF component which includes executable instructions that cause a computer to 
repeat the following: 

set a time interval to a next checkpoint;
wait until the time interval expires;

send checkpoint requests to the plurality of parallel operators;
receive and process messages from one or more of the plurality of parallel
operators; and

a plurality of parallel components, each of which is associated with one of the
plurality of parallel operators, and each of which includes executable instructions
that cause a computer to:

receive a checkpoint request message from the CRCF;

wait to enter a state suitable for checkpointing; and

send a checkpoint response message to the CRCF;
wherein 

each of the parallel components include executable instructions that cause a 
computer to: 

determine that one or more of the parallel operators has experienced a 
recoverable error; and, in sending a response message to the CRCF, 
the parallel component or components associated with the one or more
parallel operators that experienced the recoverable error or errors 
cause the computer to: 

send a recoverable error message to the CRCF; 

proceed with recovery; and 

send a ready message to the CRCF; and 
in receiving and processing messages from one or more of the plurality of 
parallel operators, the CRCF causes the computer to: 
receive the recoverable error message; 
send abandon checkpoint messages to the plurality of parallel 

operators in response to the recoverable error message; 
wait for the ready messages; 
receive the ready messages; and 
schedule a checkpoint. 
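
The interaction between the CRCF and the parallel components in claim 21 can be illustrated end to end with the Python sketch below. It is a simplified, synchronous sketch under assumptions: `ParallelComponent`, `crcf_round`, the shared `to_crcf` list, and the message strings are hypothetical, and recovery is modeled as simply clearing an error flag.

```python
class ParallelComponent:
    """Stand-in for the component associated with one parallel operator."""

    def __init__(self, name, to_crcf, has_recoverable_error=False):
        self.name = name
        self.to_crcf = to_crcf  # shared list standing in for the message channel
        self.has_recoverable_error = has_recoverable_error

    def recover(self):
        # Placeholder for real recovery work (assumption).
        self.has_recoverable_error = False

    def on_checkpoint_request(self):
        if self.has_recoverable_error:
            self.to_crcf.append((self.name, "recoverable_error"))
            self.recover()  # proceed with recovery
            self.to_crcf.append((self.name, "ready"))
        else:
            self.to_crcf.append((self.name, "checkpoint_response"))


def crcf_round(components, to_crcf):
    """Send requests, then process the replies; returns the CRCF's actions."""
    actions = []
    for component in components:
        component.on_checkpoint_request()
    failed = {sender for sender, kind in to_crcf if kind == "recoverable_error"}
    if failed:
        actions.append("abandon_checkpoint")  # sent to all operators
        ready = {sender for sender, kind in to_crcf if kind == "ready"}
        if failed <= ready:  # every failed component has reported ready
            actions.append("schedule_checkpoint")
    return actions
```

Because the sketch is synchronous, the ready messages are already present when the CRCF inspects its replies; a real system would block until they arrive.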



22. (Canceled) 

23. (Canceled) 

24. (Canceled) 

25. (Canceled) 

26. (Canceled) 